---
title: "Or, how to understand your customers' experience through mining reviews on Yelp.com"
date: 2019-12-07
tags: [business, nlp, social media mining]
header:
#  image: "/images/enrollmentForecastPic.png"
excerpt: "Business, NLP, Social Media Mining"
output: 
  html_notebook:
    code_folding: hide
---
<center>

![*Via 313 word cloud*](C:/Users/george/Documents/R Assignments/EBB AND FLOW/GeorgeAaronG.github.io/images/pizza analysis/ViaWordcloud2.png)

</center>

&nbsp;

[Yelp.com](https://www.yelp.com/) has been a go-to directory and review site for many years.  Its app holds a four-star rating in both [Google Play](https://play.google.com/store/apps/details?id=com.yelp.android&hl=en_US) and the [Apple App Store](https://apps.apple.com/us/app/yelp-food-services-around-me/id284910350), and the company claims to be home to over 135 million reviews.

From the perspective of the consumer, positive reviews can lead to increased interest and patronage.  The human phenomenon known as **FOMO** (the Fear Of Missing Out) keeps us alert to what others in our network enjoy and gives us a social incentive to keep up with the group.  In contrast, poor reviews often signal unpleasant experiences that we may want to account for when choosing where to spend our money.

From the perspective of the business owner, online reviews have become the digitized "word-of-mouth".  The internet has made first-hand experiences with almost any product or company abundantly available.  [With recent studies showing 95% of shoppers read online reviews before making a purchase](https://spiegel.medill.northwestern.edu/_pdf/Spiegel_Online%20Review_eBook_Jun2017_FINAL.pdf), it's no wonder services like Yelp are becoming increasingly important in customer experience analysis.  Any successful business must inspire loyalty in its customers; to do that, we must tap into the emotions our services evoke in our customers and the language they use when describing our business.

# Business Challenge

> As a business owner, how can I get a snapshot of what my customers are saying on Yelp.com?

In this analysis, I try to provide a data-mining alternative to the manual, labor-intensive process of reading every review.  Of course, there is a time for engaging with your customers and addressing their experiences online and in person.  However, a primary goal of business intelligence is to bring together insightful and actionable information for key decision makers.  To that end, **this project aims to give an overall sense of reviews from Yelp.com for a pizza chain in Austin, TX**.

&nbsp;

# Data Challenge

### Mining Messy Yelp.com Data for Sentiment and Term Frequency

As a broke college student, I must admit that pizza is a go-to for my colleagues and me.  Yelp can help navigate the price points and experiences of competing pizzerias in the area.  For this customer experience analysis, I will borrow the URL of Yelp.com reviews for Via 313 Pizza in Austin, TX.  I will then clean the data from multiple pages of reviews and construct a *document-term matrix* that will provide us with text-based insight.

```{r message =  FALSE}
# Libraries
library(XML)
library(RCurl)
library(ggplot2)
library(syuzhet)
library(tm)
library(SnowballC)
library(wordcloud2)

# Yelp pizzeria URLs
viaURL <- 'https://www.yelp.com/biz/via-313-pizza-north-campus-austin' # 487 Reviews
```
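
Because the reviews are spread over several pages, it can help to generate every page URL up front.  The following is a minimal sketch, assuming 20 reviews per page and a `start` query parameter that holds the offset of the first review on the page (both taken from the comments and the loop in the scraping code of the next section).  The names `baseURL`, `pageSize`, `reviewCount`, `offsets`, and `pageURLs` are illustrative and are not used elsewhere in this analysis.

```r
# Base URL from this post; page size and review count are assumptions
# based on the code comments (20 reviews per page, 487 reviews in total)
baseURL     <- "https://www.yelp.com/biz/via-313-pizza-north-campus-austin"
pageSize    <- 20
reviewCount <- 487

# Offset of the first review on each page: 0, 20, 40, ...
offsets <- seq(0, reviewCount - 1, by = pageSize)

# One URL per page; the first page needs no query string
pageURLs <- ifelse(offsets == 0,
                   baseURL,
                   paste0(baseURL, "?start=", offsets))

length(pageURLs)   # number of pages that would be requested
head(pageURLs, 3)
```

Building the vector first keeps the download step to a single loop over `pageURLs`, and changing `reviewCount` is enough to scrape more or fewer pages.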

### Cleaning Text Data

Since I will be using the pizzeria reviews from Yelp.com, I can use *RCurl* and *XML* to scrape the review pages and then clean the text into a more readable format that can be easily measured.  The cleaned reviews are lower-case words with no numbers or punctuation.  Here is an example of the first few cleaned reviews.
```{r}
# Set URL
url <- viaURL

# Create empty data frame for reviews
reviewsDF <- data.frame()

# Get data and convert to data frame
reviewsPage <- getURL(url)
parsedReviews <- htmlParse(reviewsPage)
reviews <- xpathSApply(parsedReviews, '//div[@itemtype="http://schema.org/Review"]', xmlValue) 
reviewsDF <- data.frame(reviews)

# Set number of additional pages for more reviews (20 per page)
for (i in 1:2) {
  # Set URL
  url <- paste('https://www.yelp.com/biz/via-313-pizza-north-campus-austin?start=', i*20, sep="")
  reviewsPage <- getURL(url)
  parsedReviews<-htmlParse(reviewsPage)
  reviews <- xpathSApply(parsedReviews, '//div[@itemtype="http://schema.org/Review"]', xmlValue)
  reviews <- data.frame(reviews)
  reviewsDF <- rbind(reviewsDF, reviews)
}

# catch.error() converts text to lower case, returning NA if tolower() fails
catch.error <- function(x) {
  tryCatch(tolower(x), error = function(e) NA)
}

# cleanReviews() function removes superfluous characters
cleanReviews <- function(review){
  review = gsub("\n", " ", review)
  review = gsub("[[:punct:]]", " ", review)
  review = gsub("[[:digit:]]", " ", review)
  review = iconv(review, "UTF-8", "ascii",sub='')
  review = catch.error(review)
  review
}

# cleanReviewsAndRemoveNAs() function removes NA or duplicates
cleanReviewsAndRemoveNAs <- function (allReviews) {
  allReviewsCleaned <- sapply(allReviews, cleanReviews)
  allReviewsCleaned <- allReviewsCleaned[!is.na(allReviewsCleaned)]
  names(allReviewsCleaned) = NULL
  allReviewsCleaned <- unique(allReviewsCleaned)
  allReviewsCleaned
}

# Clean the reviews
reviewsCleaned <- cleanReviewsAndRemoveNAs(reviewsDF)
head(reviewsCleaned)
```
Now we have clean text data to work with!
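
To make the cleaning steps concrete, here is a small sketch that applies the same substitutions to a single made-up review, one step at a time.  The review text and the names `rawReview`, `step1` to `step5`, and `tidy` are hypothetical.  The final whitespace-collapsing step is an optional extra and is not part of `cleanReviews()` above.

```r
# A made-up review, used only to illustrate the cleaning steps
rawReview <- "Best pizza I've had in 2019!\nThe Detroiter was worth it, 10/10."

step1 <- gsub("\n", " ", rawReview)                # newlines    -> spaces
step2 <- gsub("[[:punct:]]", " ", step1)           # punctuation -> spaces
step3 <- gsub("[[:digit:]]", " ", step2)           # digits      -> spaces
step4 <- iconv(step3, "UTF-8", "ASCII", sub = "")  # drop non-ASCII characters
step5 <- tolower(step4)                            # lower case

# Optional extra (not in cleanReviews()): collapse runs of spaces and trim
tidy <- trimws(gsub("\\s+", " ", step5))
tidy
```

Because every removed character is replaced with a space, the cleaned reviews contain runs of blanks.  These are harmless for word counting, but collapsing them makes the text easier to read.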

# Data Solution

## Customer Experience Analysis - Via 313 Pizza

### Customer Language by Emotional Association

To chart customer experiences by emotion, I can use the *syuzhet* package to score the terms in each review against the *NRC* emotion lexicon.  After counting how many reviews express each emotional category at least once, we can generate a bar chart of the customer emotions in Via 313 Pizza reviews.
```{r, results = FALSE}
# Obtain emotion frequency per review
reviewsEmotions <- get_nrc_sentiment(as.character(reviewsCleaned)) 
reviewsEmotionsDF <- t(data.frame(reviewsEmotions))

# Calculate number of reviews with each emotion > 0
reviewsEmotionsDFCount <- data.frame(rownames(reviewsEmotionsDF), 
                                     rowSums(reviewsEmotionsDF > 0))
rownames(reviewsEmotionsDFCount) <- NULL
colnames(reviewsEmotionsDFCount) <- c('Emotion','Frequency')

# Barplot of review sentiment
par(mar = c(3,6,1,1))

barplot(reviewsEmotionsDFCount$Frequency, 
        names.arg = reviewsEmotionsDFCount$Emotion,
        col = "lightblue",
        horiz = TRUE,
        main ="Customer Emotions - Via 313 Pizza", 
        cex.names = 1,
        las = 2)
```
As we can see from the chart, the leading emotions in customer reviews are positive.  Customers signal trust, joy, and anticipation when describing their experiences at Via 313 Pizza.
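
Note that the chart counts *reviews* in which an emotion appears, not the total number of emotional words.  The difference can be seen in a small sketch with hypothetical scores, shaped like the output of `get_nrc_sentiment()` (one row per review, one column per emotion).  The data frame `toyEmotions` and its values are invented for illustration.

```r
# Hypothetical per-review emotion scores: one row per review
toyEmotions <- data.frame(
  joy   = c(3, 0, 1),
  trust = c(2, 1, 0),
  anger = c(0, 0, 4)
)

# Number of reviews in which each emotion appears at least once
reviewsWithEmotion <- colSums(toyEmotions > 0)

# Total number of emotion-bearing words across all reviews
totalEmotionWords <- colSums(toyEmotions)

reviewsWithEmotion
totalEmotionWords
```

Here a single review with four angry words contributes 4 to the word total but only 1 to the review count, so counting reviews keeps one long, heated review from dominating the chart.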

### Most Frequently Used Words

Next, we can use the *tm* package to take a closer look at the most frequent words used in the reviews and generate a bar chart of those terms.
```{r, results = FALSE}
# Create text corpus of reviews
reviews_corp <- Corpus(VectorSource(reviewsCleaned))

# Remove stopwords and create document-term matrix
reviews_DTM <- DocumentTermMatrix(reviews_corp,
                                  control = list(stopwords = TRUE))

# Find frequent terms (cutoff frequency is 50)
findFreqTerms(reviews_DTM, lowfreq = 50)

# Sum each term's frequency across all reviews, then sort in descending order
reviews_FreqTerms <- sort(colSums(as.matrix(reviews_DTM)), decreasing = TRUE)

# Create data frame with two columns: word and frequency
reviews_FreqTermsDF <- data.frame(word = names(reviews_FreqTerms),
                                  freq = reviews_FreqTerms)
row.names(reviews_FreqTermsDF) <- NULL

# Plot the ten most frequently used words in reviews
barplot(reviews_FreqTermsDF[1:10,]$freq,
        las = 2,
        names.arg = reviews_FreqTermsDF[1:10,]$word,
        col ="lightblue", 
        main ="Most Frequent Words - Via 313 Pizza")
```
From the chart we can see that Via 313 Pizza is praised for its Detroit-style pizza.  People seem to love its crust, cheese, and service.  Moreover, it is often described as a place where customers had a great time or experience.
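
For readers unfamiliar with document-term matrices, the idea can be sketched in base R on three made-up reviews.  This is an illustration of the concept only, not how *tm* builds its matrix internally.  The reviews, the three-word stop word list, and all `toy*` names are hypothetical.

```r
# Three made-up, already-cleaned reviews
toyReviews <- c("great detroit style pizza",
                "the crust was great",
                "great service and great pizza")

# A tiny hand-picked stop word list (tm ships a much longer one)
toyStopwords <- c("the", "was", "and")

# Split each review into words and drop the stop words
tokens <- lapply(strsplit(toyReviews, " +"), function(words) {
  words[!(words %in% toyStopwords)]
})

# Document-term matrix: one row per review, one column per term
vocabulary <- sort(unique(unlist(tokens)))
toyDTM <- t(sapply(tokens, function(words) {
  table(factor(words, levels = vocabulary))
}))

# Column sums give each term's total frequency across reviews
toyFreq <- sort(colSums(toyDTM), decreasing = TRUE)
toyFreq
```

Each row of `toyDTM` describes one review and each column one term, so summing down the columns gives the overall term frequencies that the bar chart displays.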

### Word Cloud

Since I already have a word frequency data frame, I can use *wordcloud2* to generate a visual complement to the term frequencies.
```{r results = FALSE}
# Create word cloud
wordcloud2(data = reviews_FreqTermsDF, shape = 'triangle-forward', size = 1)
```
![*Via 313 word cloud*](C:/Users/george/Documents/R Assignments/EBB AND FLOW/GeorgeAaronG.github.io/images/pizza analysis/ViaWordcloud2.png)

# Business Solution and Next Steps

Via 313 Pizza seems to provide customers with a **satisfying experience with both good food and a pleasant environment.  Their signature Detroit-style pizza and outstanding service have helped them earn 4.5 out of 5 stars.**

To make this kind of analysis more impactful for the business, I would recommend making snapshots of customer reviews a monthly or even weekly business practice, depending on the availability of updated reviews.  Additionally, I would recommend incorporating reviews from many more sources, such as Twitter, Facebook, and Google.  These modifications can give any business a good idea of what its customers are talking about, and the business can then leverage this information to make a positive change in how it serves them.